
Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation


Abstract

Background: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities, ranging from DNA replication to gene expression control. Identifying DNA-binding proteins is one of the major challenges in genome annotation. Several computational methods for identifying DNA-binding proteins have been proposed in the literature. However, most of them cannot provide an invaluable knowledge base for our understanding of DNA-protein interactions.

Results: We first present a new protein sequence encoding method called PSSM Distance Transformation, and then construct a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by a distance transformation scheme. Lastly, the resulting representations are fed into an SVM classifier, which predicts whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than that of most existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, and outperformed some existing state-of-the-art methods.

Conclusions: The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT was constructed and is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/.
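
The abstract does not give the formula for the distance transformation. As a rough illustration only, the Python sketch below assumes one plausible form: for each gap and each ordered pair of PSSM columns, it averages the product of the scores at positions separated by that gap. The function name `pssm_distance_features`, the `max_gap` parameter, and the formula itself are assumptions for illustration and are not taken from the paper. The point it shows is how PSSM profiles of different lengths can be mapped to vectors of one fixed length, which is what a classifier such as an SVM requires.

```python
from itertools import product


def pssm_distance_features(pssm, max_gap):
    """Map an L x C score matrix to a fixed-length feature list.

    Hypothetical sketch: `pssm` is a list of L rows (one per residue),
    each holding C scores (C = 20 for a standard PSSM). For every gap
    1..max_gap and every ordered column pair (a, b), the feature is the
    mean of pssm[j][a] * pssm[j + gap][b] over all valid positions j.
    """
    length = len(pssm)
    if length <= max_gap:
        raise ValueError("sequence must be longer than max_gap")
    n_cols = len(pssm[0])
    features = []
    for gap in range(1, max_gap + 1):
        n_pairs = length - gap  # number of position pairs at this gap
        for a, b in product(range(n_cols), repeat=2):
            total = sum(pssm[j][a] * pssm[j + gap][b] for j in range(n_pairs))
            features.append(total / n_pairs)
    return features


# Made-up score matrices of different lengths (not real PSI-BLAST output).
toy_short = [[(i + k) % 5 - 2 for k in range(20)] for i in range(30)]
toy_long = [[(i * k) % 7 - 3 for k in range(20)] for i in range(50)]

# Both print 1200: 3 gaps x 20 x 20 ordered column pairs.
print(len(pssm_distance_features(toy_short, max_gap=3)))
print(len(pssm_distance_features(toy_long, max_gap=3)))
```

Under this assumed form, the output length depends only on `max_gap` and the number of columns, not on the sequence length, so proteins of any length share one feature space.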
